Project-Team:SIERRA

Inria | Raweb 2015 | Presentation of the Project-Team SIERRA | SIERRA Web Site



Section: New Results

Choice of $V$ for $V$-Fold Cross-Validation in Least-Squares

Participant: Sylvain Arlot [correspondent].

Collaboration with Matthieu Lerasle.

The paper [30] studies $V$-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing $V$ so as to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for $V$-fold cross-validation and its bias-corrected version ($V$-fold penalization). In particular, this result implies that $V$-fold penalization is asymptotically optimal in the nonparametric case. We then compute the variance of $V$-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on $V$ like $1 + 4/(V - 1)$, at least in some particular cases, suggesting that performance improves markedly from $V = 2$ to $V = 5$ or $V = 10$ and then remains almost constant. Overall, this can explain the common advice to take $V = 5$ (at least in our setting and when computational power is limited), a conclusion supported by simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter $V$ is replaced by the number $B$ of random splits of the data.
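To make the criteria above concrete, here is a minimal Python sketch; it is not the code used in [30]. It assumes, purely for illustration, that the candidate models are regular histograms on $[0, 1]$ (the number of bins plays the role of the model) and uses the least-squares contrast $\gamma(t; x) = \|t\|^2 - 2 t(x)$. The $V$-fold criterion averages, over the $V$ blocks, the contrast of the estimator trained without a block and evaluated on that block. The Monte-Carlo variant averages over $B$ random train/test splits. The helper variance_factor simply evaluates $1 + 4/(V - 1)$. All function names and the training fraction p_train are hypothetical, and $V$-fold penalization is not shown.

```python
import random


def variance_factor(V):
    """Evaluate the dependence on V quoted in the text: 1 + 4 / (V - 1)."""
    if V < 2:
        raise ValueError("V-fold cross-validation needs V >= 2")
    return 1.0 + 4.0 / (V - 1)


def histogram_fit(train, n_bins):
    """Regular-histogram density estimator on [0, 1] (assumes all data lie in [0, 1])."""
    counts = [0] * n_bins
    for x in train:
        # min(...) sends x == 1.0 to the last bin
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    width = 1.0 / n_bins
    return [c / (len(train) * width) for c in counts]


def ls_risk(heights, test):
    """Empirical least-squares contrast on test: ||t||^2 - 2 * mean of t(x) over test."""
    n_bins = len(heights)
    width = 1.0 / n_bins
    sq_norm = sum(h * h for h in heights) * width
    mean_value = sum(heights[min(int(x * n_bins), n_bins - 1)] for x in test) / len(test)
    return sq_norm - 2.0 * mean_value


def vfold_cv(data, n_bins, V, seed=0):
    """V-fold CV criterion: train on V - 1 blocks, evaluate on the held-out block, average."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::V] for j in range(V)]  # V disjoint blocks of near-equal size
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in fold]
        total += ls_risk(histogram_fit(train, n_bins), test)
    return total / V


def monte_carlo_cv(data, n_bins, B, p_train, seed=0):
    """Monte-Carlo (repeated) CV: average the held-out contrast over B random splits."""
    rng = random.Random(seed)
    n_train = int(round(p_train * len(data)))
    total = 0.0
    for _ in range(B):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        train = [data[i] for i in idx[:n_train]]
        test = [data[i] for i in idx[n_train:]]
        total += ls_risk(histogram_fit(train, n_bins), test)
    return total / B


def select_bins(data, candidates, V, seed=0):
    """Return the candidate number of bins that minimizes the V-fold criterion."""
    return min(candidates, key=lambda D: vfold_cv(data, D, V, seed))
```

For instance, variance_factor returns 5.0 at $V = 2$, 2.0 at $V = 5$ and about 1.44 at $V = 10$, mirroring the diminishing returns described above, while a call such as select_bins(sample, range(1, 31), V=5) picks a number of bins for a sample in $[0, 1]$. Fixing the seed reuses the same split for every candidate, so all models are compared on identical folds.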
